Quantifying Cognitive Bias Induction in LLM-Generated Content

Alessa, Abeer, Somane, Param, Lakshminarasimhan, Akshaya, Skirzynski, Julian, McAuley, Julian, Echterhoff, Jessica

arXiv.org Artificial Intelligence

Large language models (LLMs) are integrated into applications like shopping reviews, summarization, or medical diagnosis support, where their use affects human decisions. We investigate the extent to which LLMs expose users to biased content and demonstrate its effect on human decision-making. We assess five LLM families on summarization and news fact-checking tasks, evaluating the consistency of LLMs with their context and their tendency to hallucinate on a new self-updating dataset. Our findings show that LLMs expose users to content that changes the context's sentiment in 26.42% of cases (framing bias), hallucinate on 60.33% of post-knowledge-cutoff questions, and highlight context from earlier parts of the prompt (primacy bias) in 10.12% of cases, averaged across all tested models. We further find that humans are 32% more likely to purchase the same product after reading an LLM-generated summary of a review rather than the original review. To address these issues, we evaluate 18 mitigation methods across three LLM families and find that targeted interventions are effective.
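
The framing-bias measurement described above can be pictured as counting how often a summary flips the sentiment of its source. The following is a minimal sketch of that idea; the tiny lexicon-based `sentiment` function is a hypothetical placeholder, not the paper's actual classifier, and in practice a trained sentiment model would take its place.

```python
# Minimal sketch: estimate a framing-bias rate as the fraction of
# (source, summary) pairs whose sentiment polarity disagrees. The tiny
# lexicon-based `sentiment` below is a placeholder for a real classifier.

POSITIVE = {"great", "excellent", "love", "reliable", "recommend"}
NEGATIVE = {"poor", "broken", "disappointing", "refund", "avoid"}

def sentiment(text: str) -> int:
    """Return +1 (positive), -1 (negative), or 0 (neutral)."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return (score > 0) - (score < 0)

def framing_bias_rate(pairs: list[tuple[str, str]]) -> float:
    """Fraction of pairs where the summary's polarity differs from the source's."""
    flips = sum(sentiment(src) != sentiment(summ) for src, summ in pairs)
    return flips / len(pairs)

pairs = [
    ("The battery is poor and it broke within a week.",
     "A compact phone that many users recommend."),
    ("I love this blender, it is reliable and excellent.",
     "A reliable blender that owners love."),
]
print(f"framing-bias rate: {framing_bias_rate(pairs):.2f}")  # 0.50
```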


Challenging Multilingual LLMs: A New Taxonomy and Benchmark for Unraveling Hallucination in Translation

Wu, Xinwei, Liu, Heng, Zhou, Jiang, Zhao, Xiaohu, Xu, Linlong, Wang, Longyue, Luo, Weihua, Zhang, Kaifu

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have advanced machine translation but remain vulnerable to hallucinations. Unfortunately, existing MT benchmarks are not capable of exposing failures in multilingual LLMs. To expose hallucinations in multilingual LLMs, we introduce a diagnostic framework with a taxonomy that separates Instruction Detachment from Source Detachment. Guided by this taxonomy, we create HalloMTBench, a multilingual, human-verified benchmark across 11 English-to-X directions. We employ 4 frontier LLMs to generate candidates, which we then scrutinize with an ensemble of LLM judges and expert validation. In this way, we curate 5,435 high-quality instances. We evaluate 17 LLMs on HalloMTBench. Results reveal distinct ``hallucination triggers'' -- unique failure patterns reflecting model scale, source length sensitivity, linguistic biases, and Reinforcement-Learning (RL)-amplified language mixing. HalloMTBench offers a forward-looking testbed for diagnosing LLM translation failures. HalloMTBench is available at https://huggingface.co/collections/AIDC-AI/marco-mt.
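
The abstract mentions an ensemble of LLM judges feeding into expert validation. One plausible aggregation scheme is a majority vote over the two detachment categories plus a "faithful" label, with ties escalated to experts; the sketch below assumes that scheme, and the stub judge functions stand in for real LLM calls rather than the authors' actual prompts.

```python
# Minimal sketch of ensemble judging, assuming each judge labels a candidate
# translation as "faithful", "instruction_detachment", or "source_detachment".
# The stub judges below stand in for real LLM API calls.
from collections import Counter

def judge_a(source: str, translation: str) -> str:
    # e.g. flag suspiciously short outputs as detached from the source
    return "source_detachment" if len(translation) < len(source) // 2 else "faithful"

def judge_b(source: str, translation: str) -> str:
    # e.g. flag outputs that ignore the instruction and reply in plain English
    return "instruction_detachment" if translation.isascii() else "faithful"

def judge_c(source: str, translation: str) -> str:
    return "faithful"

def ensemble_label(source: str, translation: str) -> str:
    votes = Counter(j(source, translation) for j in (judge_a, judge_b, judge_c))
    label, count = votes.most_common(1)[0]
    # Only keep instances with a clear majority; others go to expert review.
    return label if count >= 2 else "needs_expert_review"

print(ensemble_label("Translate to German: The weather is nice today.",
                     "Das Wetter ist heute schön."))  # faithful
```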


Multilingual Text-to-Image Person Retrieval via Bidirectional Relation Reasoning and Aligning

Cao, Min, Zhou, Xinyu, Jiang, Ding, Du, Bo, Ye, Mang, Zhang, Min

arXiv.org Artificial Intelligence

Text-to-image person retrieval (TIPR) aims to identify a target person from textual descriptions, facing the challenge of modality heterogeneity. Prior works have attempted to address it by developing cross-modal global or local alignment strategies. However, global methods typically overlook fine-grained cross-modal differences, whereas local methods require prior information to explore explicit part alignments. Additionally, current methods are English-centric, restricting their application in multilingual contexts. To alleviate these issues, we pioneer a multilingual TIPR task by developing a multilingual TIPR benchmark, for which we leverage large language models for initial translations and refine them by integrating domain-specific knowledge. Correspondingly, we propose Bi-IRRA: a Bidirectional Implicit Relation Reasoning and Aligning framework to learn alignment across languages and modalities. Within Bi-IRRA, a bidirectional implicit relation reasoning module enables bidirectional prediction of masked image and text, implicitly enhancing the modeling of local relations across languages and modalities, and a multi-dimensional global alignment module is integrated to bridge the modality heterogeneity. The proposed method achieves new state-of-the-art results on all multilingual TIPR datasets. The task is similar to the person re-identification task (Re-ID) [2], [3], [4], which involves identifying person images across cameras based on an image query. In contrast to the structured image query in Re-ID, the text query in TIPR takes the form of free, flexible text, making it more accessible and offering substantial application potential in public-safety domains. A key challenge in TIPR is the inherent modality gap between vision and language, driving research toward robust cross-modal alignment. Global methods align text-image representations at the coarse-grained level via cross-modal matching loss functions (Figure 1(a)), while local methods establish fine-grained associations between textual entities and image body parts (Figure 1(b)). Despite notable progress on this task, two critical issues remain to be addressed.
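
The coarse-grained global alignment mentioned above is typically realized with a cross-modal matching loss over paired image and text embeddings. Below is a generic InfoNCE-style sketch of that idea; it is not Bi-IRRA's actual module, and the embedding dimension and temperature are assumptions.

```python
# Generic sketch of a coarse-grained cross-modal matching loss (InfoNCE style),
# illustrating the kind of global text-image alignment the abstract refers to.
# Not Bi-IRRA's actual objective; shapes and temperature are assumptions.
import torch
import torch.nn.functional as F

def global_alignment_loss(img_emb: torch.Tensor,
                          txt_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """img_emb, txt_emb: (batch, dim) embeddings; matched pairs share an index."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature   # (batch, batch) pairwise similarities
    targets = torch.arange(img.size(0))    # diagonal entries are the positives
    # Symmetric loss: image-to-text and text-to-image retrieval directions.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

loss = global_alignment_loss(torch.randn(8, 256), torch.randn(8, 256))
print(loss.item())
```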


GAMBIT+: A Challenge Set for Evaluating Gender Bias in Machine Translation Quality Estimation Metrics

Filandrianos, Giorgos, Mastromichalakis, Orfeas Menis, Mohammed, Wafaa, Attanasio, Giuseppe, Zerva, Chrysoula

arXiv.org Artificial Intelligence

Gender bias in machine translation (MT) systems has been extensively documented, but bias in automatic quality estimation (QE) metrics remains comparatively underexplored. Existing studies suggest that QE metrics can also exhibit gender bias, yet most analyses are limited by small datasets, narrow occupational coverage, and restricted language variety. To address this gap, we introduce a large-scale challenge set specifically designed to probe the behavior of QE metrics when evaluating translations containing gender-ambiguous occupational terms. Building on the GAMBIT corpus of English texts with gender-ambiguous occupations, we extend coverage to three source languages that are genderless or natural-gendered, and eleven target languages with grammatical gender, resulting in 33 source-target language pairs. Each source text is paired with two target versions differing only in the grammatical gender of the occupational term(s) (masculine vs. feminine), with all dependent grammatical elements adjusted accordingly. An unbiased QE metric should assign equal or near-equal scores to both versions. The dataset's scale, breadth, and fully parallel design, where the same set of texts is aligned across all languages, enables fine-grained bias analysis by occupation and systematic comparisons across languages.
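
The bias probe described here reduces to scoring two target versions that differ only in grammatical gender and comparing the results: an unbiased metric should show a near-zero gap. The sketch below assumes a hypothetical `qe_score(source, translation)` function in place of a real QE metric.

```python
# Minimal sketch: probe a QE metric for gender bias by scoring two target
# versions that differ only in grammatical gender. `qe_score` is a hypothetical
# stand-in for a real metric (e.g. a learned QE model).
from statistics import mean

def qe_score(source: str, translation: str) -> float:
    return 0.5  # placeholder: a real metric returns a quality estimate

def gender_gap(items: list[dict]) -> float:
    """Mean (masculine - feminine) score difference; ~0 for an unbiased metric."""
    return mean(
        qe_score(x["source"], x["masc"]) - qe_score(x["source"], x["fem"])
        for x in items
    )

items = [{
    "source": "The nurse prepared the report.",
    "masc": "Der Krankenpfleger bereitete den Bericht vor.",
    "fem": "Die Krankenpflegerin bereitete den Bericht vor.",
}]
print(f"mean masc-fem gap: {gender_gap(items):+.3f}")
```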


FinReflectKG - EvalBench: Benchmarking Financial KG with Multi-Dimensional Evaluation

Dimino, Fabrizio, Arun, Abhinav, Sarmah, Bhaskarjit, Pasquali, Stefano

arXiv.org Artificial Intelligence

Large language models (LLMs) are increasingly being used to extract structured knowledge from unstructured financial text. Although prior studies have explored various extraction methods, there is no universal benchmark or unified evaluation framework for the construction of financial knowledge graphs (KGs). We introduce FinReflectKG - EvalBench, a benchmark and evaluation framework for KG extraction from SEC 10-K filings. Building on the agentic and holistic evaluation principles of FinReflectKG - a financial KG linking audited triples to source chunks from S&P 100 filings and supporting single-pass, multi-pass, and reflection-agent-based extraction modes - EvalBench implements a deterministic commit-then-justify judging protocol with explicit bias controls, mitigating position effects, leniency, verbosity, and world-knowledge reliance. Each candidate triple is evaluated with binary judgments of faithfulness, precision, and relevance, while comprehensiveness is assessed on a three-level ordinal scale (good, partial, bad) at the chunk level. Our findings suggest that, when equipped with explicit bias controls, LLM-as-Judge protocols provide a reliable and cost-efficient alternative to human annotation, while also enabling structured error analysis. Reflection-based extraction emerges as the superior approach, achieving the best performance in comprehensiveness, precision, and relevance, while single-pass extraction maintains the highest faithfulness. By aggregating these complementary dimensions, FinReflectKG - EvalBench enables fine-grained benchmarking and bias-aware evaluation, advancing transparency and governance in financial AI applications.
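
A commit-then-justify protocol can be pictured as a two-stage judge call: the verdict is locked in before any rationale is produced, so the justification cannot retroactively soften the judgment. The sketch below is an assumed outline of that flow, with a stubbed `ask_llm` in place of a real API call; the prompts are illustrative, not the benchmark's actual ones.

```python
# Sketch of a commit-then-justify judging step: the judge first commits to a
# binary verdict per dimension, then justifies it in a separate turn, so the
# rationale cannot retroactively change the verdict. `ask_llm` is a stub.

DIMENSIONS = ("faithfulness", "precision", "relevance")

def ask_llm(prompt: str) -> str:
    return "yes"  # placeholder for a real LLM API call

def judge_triple(triple: tuple[str, str, str], chunk: str) -> dict:
    verdicts = {}
    for dim in DIMENSIONS:
        # Stage 1: commit. Force a bare yes/no answer with no explanation.
        verdict = ask_llm(
            f"Chunk: {chunk}\nTriple: {triple}\n"
            f"Is this triple acceptable on {dim}? Answer only yes or no."
        ).strip().lower() == "yes"
        # Stage 2: justify. The committed verdict is fixed inside the prompt.
        rationale = ask_llm(
            f"You answered '{'yes' if verdict else 'no'}' for {dim}. "
            f"Justify that judgment in one sentence."
        )
        verdicts[dim] = {"verdict": verdict, "rationale": rationale}
    return verdicts

print(judge_triple(("Apple Inc.", "reports_segment", "Services"),
                   "Apple's Services segment includes advertising..."))
```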


EditLens: Quantifying the Extent of AI Editing in Text

Thai, Katherine, Emi, Bradley, Masrour, Elyas, Iyyer, Mohit

arXiv.org Artificial Intelligence

A significant proportion of queries to large language models ask them to edit user-provided text, rather than generate new text from scratch. While previous work focuses on detecting fully AI-generated text, we demonstrate that AI-edited text is distinguishable from human-written and AI-generated text. First, we propose using lightweight similarity metrics to quantify the magnitude of AI editing present in a text given the original human-written text, and validate these metrics with human annotators. Using these similarity metrics as intermediate supervision, we then train EditLens, a regression model that predicts the amount of AI editing present within a text. Our model achieves state-of-the-art performance on both binary (F1=94.7%) and ternary (F1=90.4%) classification tasks in distinguishing human, AI, and mixed writing. We show not only that AI-edited text can be detected, but also that the degree of change made by AI to human writing can be quantified, which has implications for authorship attribution, education, and policy. Finally, as a case study, we use our model to analyze the effects of AI edits applied by Grammarly, a popular writing assistance tool. To encourage further research, we commit to publicly releasing our models and dataset.
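
A lightweight similarity metric of the kind described might look like the following sketch, which uses the standard-library difflib as an illustrative stand-in for the paper's actual metric suite: the farther the similarity ratio falls below 1, the larger the inferred edit magnitude.

```python
# Sketch of a lightweight similarity metric for quantifying edit magnitude
# between an original human text and its AI-edited version. difflib is
# illustrative; the paper's actual metrics may differ.
import difflib

def edit_magnitude(original: str, edited: str) -> float:
    """0.0 = identical, 1.0 = completely rewritten (1 - similarity ratio)."""
    return 1.0 - difflib.SequenceMatcher(None, original, edited).ratio()

original = "The results was significant and suggests a strong effect."
edited = "The results were significant, suggesting a strong effect."
print(f"edit magnitude: {edit_magnitude(original, edited):.2f}")
```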



Deconstructing Self-Bias in LLM-generated Translation Benchmarks

Xu, Wenda, Agrawal, Sweta, Zouhar, Vilém, Freitag, Markus, Deutsch, Daniel

arXiv.org Artificial Intelligence

As large language models (LLMs) begin to saturate existing benchmarks, automated benchmark creation using LLMs (LLM-as-a-benchmark) has emerged as a scalable alternative to slow and costly human curation. While these generated test sets have the potential to cheaply rank models, we demonstrate a critical flaw: LLM-generated benchmarks systematically favor the model that created them, exhibiting self-bias on low-resource-language-to-English translation tasks. We show three key findings on automatic benchmarking of LLMs for translation. First, this bias originates from two sources: the generated test data (LLM-as-a-testset) and the evaluation method (LLM-as-an-evaluator), with their combination amplifying the effect. Second, self-bias in LLM-as-a-benchmark is heavily influenced by the model's generation capabilities in the source language. For instance, we observe more pronounced bias in into-English translation, where the model's generation capabilities are more developed, than in out-of-English translation tasks. Third, we observe that low diversity in the source texts is one contributor to self-bias. Our results suggest that improving the diversity of these generated source texts can mitigate some of the observed self-bias. The rapid advancements in Large Language Models (LLMs) have led to an unprecedented saturation of existing, meticulously human-curated benchmarks. This phenomenon exposes two critical, intertwined challenges: traditional benchmark creation is too laborious and expensive to keep pace with rapid model development, and this challenge is compounded by the inherent difficulty of constructing high-quality benchmarks for low-resource languages, even with human labor, which further strains existing benchmark resources.
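
One way to operationalize the self-bias discussed above is as the score advantage a model receives on its own generated benchmark relative to benchmarks generated by other models. The sketch below assumes a simple score matrix and is not necessarily the authors' exact formulation.

```python
# Sketch: quantify self-bias as how much better a model scores on the benchmark
# it generated than on benchmarks generated by other models. The score matrix
# is illustrative, not the paper's exact formulation.
from statistics import mean

# scores[creator][model] = model's score on the benchmark generated by creator
scores = {
    "model_a": {"model_a": 0.82, "model_b": 0.74},
    "model_b": {"model_a": 0.71, "model_b": 0.80},
}

def self_bias(model: str) -> float:
    own = scores[model][model]
    others = [bench[model] for creator, bench in scores.items() if creator != model]
    return own - mean(others)

for m in scores:
    print(f"{m}: self-bias = {self_bias(m):+.3f}")
# model_a: +0.110, model_b: +0.060 -> both favored by their own benchmarks
```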


Fine-Grained Detection of Context-Grounded Hallucinations Using LLMs

Peisakhovsky, Yehonatan, Gekhman, Zorik, Mass, Yosi, Ein-Dor, Liat, Reichart, Roi

arXiv.org Artificial Intelligence

Context-grounded hallucinations are cases where model outputs contain information not verifiable against the source text. We study the applicability of LLMs for localizing such hallucinations, as a more practical alternative to existing complex evaluation pipelines. In the absence of established benchmarks for meta-evaluation of hallucination localization, we construct one tailored to LLMs, involving a challenging human annotation of over 1,000 examples. We complement the benchmark with an LLM-based evaluation protocol, verifying its quality in a human evaluation. Since existing representations of hallucinations limit the types of errors that can be expressed, we propose a new representation based on free-form textual descriptions, capturing the full range of possible errors. We conduct a comprehensive study, evaluating four large-scale LLMs, which highlights the benchmark's difficulty, as the best model achieves an F1 score of only 0.67. Through careful analysis, we offer insights into optimal prompting strategies for the task and identify the main factors that make it challenging for LLMs: (1) a tendency to incorrectly flag missing details as inconsistent, despite being instructed to check only facts in the output; and (2) difficulty with outputs containing factually correct information absent from the source - and thus not verifiable - due to alignment with the model's parametric knowledge.
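
Meta-evaluating localization of this kind typically comes down to matching predicted errors against gold annotations and computing F1. The sketch below uses exact-match over flagged spans as a simplified stand-in for the paper's LLM-based matching of free-form error descriptions.

```python
# Simplified sketch of localization meta-evaluation: precision/recall/F1 over
# flagged hallucination spans. Exact string match stands in for the paper's
# LLM-based matching of free-form error descriptions.

def prf1(predicted: set[str], gold: set[str]) -> tuple[float, float, float]:
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

predicted = {"founded in 1999", "based in Berlin"}
gold = {"founded in 1999", "acquired by X Corp"}
p, r, f = prf1(predicted, gold)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # P=0.50 R=0.50 F1=0.50
```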